Tracing signs

From philosophy to linguistics via data science
(and back)

Joshua Wilson Black

Te Kāhui Roro Reo | New Zealand Institute of Language, Brain and Behaviour
Te Whare Wānanga o Waitaha | University of Canterbury

Overview

  1. Philosophical orientation
  2. Data science methods
  3. Two current research streams:
    1. sociolinguistics, and
    2. experimental history of philosophy
  4. Future directions

Orientation

Philosophical theses

  • (History of) philosophy: Time depth is essential for understanding anything!
  • Pragmatism: We are social animals embedded in an environment. No minds without bodies!
  • Signs: Signs are embodied, come in many forms, and change over time.
  • Ethics of inquiry: inquiry is essentially communal and generates ethical demands. These include maximal openness regarding both methods and data.

Motivating question

How can we trace and explain the dynamics of human signs, understood in their broadest terms?

Data science methods

Data science methods

  • ‘Reading old books and writing about them’ has problems:
    • scale, sampling, demographics, and narrow ‘semiotic range’
  • Data science offers new opportunities:
    • contemporary computational resources,
    • access to large, varied corpora with significant time depth.
  • Challenges remain (but this is fun!)

Reproduction and Generalisation

Stream 1: Sociolinguistics

Tracking vowels

  • Sociolinguistic variables:
    • are signs, and
    • draw attention to the embodiment of sign users within a social environment.
  • Corpora housed at NZILBB offer unique opportunities for tracing change in sociolinguistic variables across a wide range of time scales:
    • minutes, years, and decades
  • My research has focused on vowels in terms of:
    • Clustering: how vowels function together.
    • Acquisition: how vocalic variation is picked up by children.

Minutes

  • RQ: Do vowels function together as features of styles in the course of a monologue?
  • Corpus: QuakeBox (Clark et al. 2016; Walsh et al. 2013)
  • Method: divide monologue into intervals → apply Principal Component Analysis (PCA) → explain results using GAMMs.
  • Answer: Apparently not!
  • Unexpected discovery: we found an (overlooked!) influence of amplitude on the vowel space.
  • Data science methods are powerful exploratory tools.
    • The RQ and unexpected discovery differ.
    • Our write-up is explicit about this structure.
  • Forthcoming in Linguistics Vanguard (Wilson Black et al. forthcoming)
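The first two steps of the method above (divide into intervals, then apply PCA) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the analysis from the paper: the data are synthetic, only F1 is used, the intervals are equal-width, the vowel labels are placeholders, and the function names `interval_vowel_means` and `pca` are invented for this example. The GAMM step is not shown.

```python
import numpy as np

def interval_vowel_means(times, vowels, f1, categories, n_intervals):
    """Mean F1 per vowel category within equal-width time intervals.

    Returns an (n_intervals x n_categories) matrix. Cells with no tokens
    stay NaN, so real data would need a policy for sparse intervals.
    """
    edges = np.linspace(times.min(), times.max(), n_intervals + 1)
    # digitize gives 1..n_intervals for interior points; clip so the
    # latest token falls in the last interval rather than past it.
    idx = np.clip(np.digitize(times, edges) - 1, 0, n_intervals - 1)
    out = np.full((n_intervals, len(categories)), np.nan)
    for i in range(n_intervals):
        for j, v in enumerate(categories):
            mask = (idx == i) & (vowels == v)
            if mask.any():
                out[i, j] = f1[mask].mean()
    return out

def pca(X):
    """PCA via SVD of the column-standardised matrix."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # loadings (variables x PCs), scores (intervals x PCs), variance ratios
    return Vt.T, U * s, explained

# Synthetic stand-in for one monologue (assumption: 600 s, 1200 tokens).
rng = np.random.default_rng(1)
n_tokens = 1200
categories = np.array(["DRESS", "KIT", "TRAP"])
times = rng.uniform(0, 600, n_tokens)
vowels = rng.choice(categories, n_tokens)
f1 = rng.normal(500, 50, n_tokens)

X = interval_vowel_means(times, vowels, f1, categories, n_intervals=10)
loadings, scores, explained = pca(X)
print(np.round(explained, 3))
```

If vowels patterned together as features of a style, one would expect a leading component with large loadings on several vowels at once; with random data like this, no such structure should appear.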

Years

  • RQ: How do children acquire and advance existing sound changes?
  • RQ: Does this match the standard account (Labov 2001)?
    • Labov assumes transmission of a female parent’s production to their children.
  • Corpus: a longitudinal Christchurch preschool story retell corpus (ages 3;11 to 5;5).
  • Data challenges:
  • Result: children start from a more ‘conservative’ place in the New Zealand short vowel shift than expected on Labov’s account.
  • Work in progress with Lynn Clark (PI), Robert Fromont, and Maggie Blackwood.

Decades

  • RQ: How can the methods applied by Brand et al. (2021) be strengthened and generalised?
    • The ‘loadings’ used to cluster vowels can be unstable. But how unstable?
  • Corpora: ONZE, QB1, and QB2.
  • Solution:
    • calculate error bars via bootstrapping, and
    • compare with null distribution generated by permutation.
  • Implementation: an R package (nzilbb.vowels) which provides functions to perform the relevant calculations (Wilson Black and Brand 2023).
  • Published in Language and Linguistics Compass (Wilson Black et al. 2022)
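The two-part solution above (bootstrapped error bars, plus a permutation-based null distribution) can be illustrated with a minimal sketch. It is written in Python with NumPy and does not use or reproduce the nzilbb.vowels API. The assumptions are mine: synthetic speaker-by-variable data, squared PC1 loadings as the quantity of interest (which sidesteps the arbitrary sign of a component), percentile intervals, and the helper names `pc1`, `bootstrap_loadings`, and `permutation_null`.

```python
import numpy as np

def pc1(X):
    """Squared PC1 loadings and the share of variance PC1 explains."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[0] ** 2, s[0] ** 2 / np.sum(s ** 2)

def bootstrap_loadings(X, n_boot, rng):
    """95% percentile intervals for squared PC1 loadings.

    Rows (speakers) are resampled with replacement.
    """
    n = X.shape[0]
    draws = np.array([pc1(X[rng.integers(0, n, n)])[0] for _ in range(n_boot)])
    return np.percentile(draws, [2.5, 97.5], axis=0)

def permutation_null(X, n_perm, rng):
    """Null distribution of PC1 variance explained.

    Each column is shuffled independently, which destroys covariation
    between variables while keeping each variable's own distribution.
    """
    null = np.empty(n_perm)
    for k in range(n_perm):
        shuffled = np.column_stack([rng.permutation(col) for col in X.T])
        null[k] = pc1(shuffled)[1]
    return null

# Synthetic data: 100 'speakers', 4 variables; the first two share a
# latent factor, the last two are independent noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=100)
X = np.column_stack([
    latent + 0.5 * rng.normal(size=100),
    latent + 0.5 * rng.normal(size=100),
    rng.normal(size=100),
    rng.normal(size=100),
])

sq_loadings, var_pc1 = pc1(X)
lower, upper = bootstrap_loadings(X, 500, rng)
null = permutation_null(X, 500, rng)
p_value = (np.sum(null >= var_pc1) + 1) / (len(null) + 1)
print(np.round(lower, 2), np.round(upper, 2), round(float(p_value), 3))
```

Wide bootstrap intervals flag loadings that depend heavily on which speakers happen to be in the sample; the permutation comparison asks whether PC1 captures more variance than one would expect with no covariation at all.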

Stream 2: Experimental history of philosophy

An emerging field

  • Experimental history of philosophy is ‘experimental’ relative to philosophy
    • i.e. it includes corpus work.
  • Current work uses:
    • corpus analysis and bibliometrics to track history of key terms and networks of influence (e.g. Betti et al. 2019), and
    • semantic networks to identify gaps in current literature (e.g. Alfano 2019).
  • This is part of a broader shift towards using large-scale text corpora to investigate human thought (e.g., in psychology, Jackson et al. 2022).

Philosophy in newspapers

  • My work has looked to English-language New Zealand newspapers up to 1900.
  • An interesting time:
    • public engagement with emerging evolutionary thought,
    • establishment of colonial government, and
    • world-view interaction and clashes between Māori and Pākehā.
  • Traditional methods in history of philosophy have found barely any story to tell (e.g. Davies and Helgeby 2014).
  • Two demographic expansions:
    • a geographically remote area, and
    • a more diverse population.

Finding philosophy

  • RQ1: How can philosophical writing be identified in historical newspapers?
    • Problem: ‘needle in the haystack’.
  • RQ2: Can we extract meaningful patterns from extracted texts?
  • Method:
    • Iterative bootstrapping for corpus construction.
    • Cooccurrence networks for extracting patterns in use of words.
  • Results:
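As an illustration of the cooccurrence step in the method above (not a result from the project), the following sketch counts how often pairs of words appear in the same document and keeps the pairs that recur. The toy documents, the stopword list, the document-level window, and the function name `cooccurrence_edges` are all assumptions made for the example; the published work may define cooccurrence differently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stopword list, far smaller than anything used in practice.
STOPWORDS = {"and", "on", "the", "of"}

def cooccurrence_edges(documents, min_count=2):
    """Weighted edges for a word cooccurrence network.

    Two words are linked when they appear in the same document; the
    weight is the number of documents in which they appear together.
    """
    counts = Counter()
    for doc in documents:
        # set(): each pair is counted at most once per document.
        # sorted(): gives every pair a canonical (alphabetical) order.
        words = sorted(set(doc.lower().split()) - STOPWORDS)
        counts.update(combinations(words, 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Invented toy documents, standing in for retrieved newspaper items.
docs = [
    "evolution and religion debate",
    "lecture on evolution and morality",
    "religion and morality lecture",
]

edges = cooccurrence_edges(docs)
print(edges)  # {('lecture', 'morality'): 2}
```

The resulting dictionary can be read as an edge list: nodes are words and weights are shared-document counts, which is the form most network libraries accept.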

Current project

  • A shift from newspapers as text to newspapers as evidence.
  • Why? Whether a text is ‘philosophy’ is contentious. A less contentious approach is to look directly for evidence of intellectual argument and activity.
  • Aim: to
    • identify advertisements for public debates and lectures (not contentious!),
    • extract the who, where, what, and when,
    • classify events by content, and
    • map them geographically and temporally in a publicly accessible interface.
  • I am currently producing a pilot study developing these methods using Christchurch newspapers (with seed funding from UC Arts Digital Lab).
  • These projects are unprecedented in scale for applications of data science methods to philosophy.

Future directions

Stream 1

  • Current pilot study: PCA as a tool for investigating systematic change over the lifespan between QB1 and QB2.
  • Return to within-speaker stylistic variation.
    • Pilot alternative methods for dividing recordings and clustering variables.
  • Sort out normalisation.
    • Widely used methods are insufficiently supported.
    • Why do normalisation methods struggle with young children?
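For context, one widely used normalisation method is Lobanov normalisation, which z-scores each formant within each speaker. The slides do not name a specific method, so treating Lobanov as the example here is my assumption, as are the toy values and the function name `lobanov`.

```python
from collections import defaultdict
from statistics import mean, stdev

def lobanov(tokens):
    """Lobanov-normalise (speaker, f1, f2) tokens.

    Each formant is centred on the speaker's own mean and scaled by the
    speaker's own standard deviation, so values become comparable
    across speakers with different vocal tract sizes.
    """
    by_speaker = defaultdict(list)
    for speaker, f1, f2 in tokens:
        by_speaker[speaker].append((f1, f2))

    stats = {}
    for speaker, values in by_speaker.items():
        f1s, f2s = zip(*values)
        stats[speaker] = (mean(f1s), stdev(f1s), mean(f2s), stdev(f2s))

    normalised = []
    for speaker, f1, f2 in tokens:
        m1, s1, m2, s2 = stats[speaker]
        normalised.append((speaker, (f1 - m1) / s1, (f2 - m2) / s2))
    return normalised

# Invented formant values in Hz for two hypothetical speakers.
tokens = [
    ("a", 400, 2000), ("a", 600, 1800), ("a", 500, 1500),
    ("b", 450, 2200), ("b", 700, 1900), ("b", 550, 1600),
]
result = lobanov(tokens)
print(result[0])
```

Because the speaker mean and standard deviation are estimated from that speaker's own tokens, the result depends on how many tokens are available and how evenly they cover the vowel space.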

Stream 2

  • Intellectual life in NZ newspapers
    • Acquire funding to extend the pilot study of public debates and lectures nationwide.
    • Trace the emergence and spread of evolutionary ideas in a ‘bottom-up’ way.
    • Build relationships with Māori researchers and capacity in te reo Māori.
  • Does elite philosophical writing lead or lag public reasoning?
    • History-inclined philosophers often attribute changes in society to changes in elite philosophy.
    • On the other hand, ‘The owl of Minerva spreads its wings only with the coming of the dusk.’
    • This relationship is testable in key cases.

Combining streams

  • Newspaper complaints have been used to track changes in New Zealand English by the ONZE project (Gordon et al. 2004).
  • NZILBB holds quite a few newspaper clippings of this sort.
  • Can this approach be scaled?

Tracing signs

  • I apply contemporary data science methods to large corpora to trace changes in human sign use over multiple time scales.
  • I aim at both reproducibility and methodological generalisability through extensive documentation and methods sharing in supplementary material, tutorials, and software.

Ngā mihi nui!

References

Alfano, Mark. 2019. Nietzsche’s Moral Psychology. Cambridge University Press. https://doi.org/10.1017/9781139696555.
Betti, Arianna, Hein van den Berg, Yvette Oortwijn, and Caspar Treijtel. 2019. “History of Philosophy in Ones and Zeros.” In Methodological Advances in Experimental Philosophy. Bloomsbury Academic. https://doi.org/10.5040/9781350069022.ch-011.
Black, Joshua. 2013. “Peirce on Habit, Practice, and Theory: The Priority of Practice and the Autonomy of Theory.” Master’s thesis, University of Waikato.
———. 2017. “Peirce’s Conception of Metaphysics.” PhD thesis, University of Sheffield.
Brand, James, Jen Hay, Lynn Clark, Kevin Watson, and Márton Sóskuthy. 2021. “Systematic Co-Variation of Monophthongs Across Speakers of New Zealand English.” Journal of Phonetics 88 (September): 101096. https://doi.org/10.1016/j.wocn.2021.101096.
Clark, Lynn, Helen MacGougan, Jennifer Hay, and Liam Walsh. 2016. “‘Kia Ora. This Is My Earthquake Story.’ Multiple Applications of a Sociolinguistic Corpus.” Ampersand 3: 13–20. https://doi.org/10.1016/j.amper.2016.01.001.
Davies, Martin, and Stein Helgeby. 2014. “Idealist Origins: 1920s and Before.” In History of Philosophy in Australia and New Zealand, edited by Graham Oppy and Nick Trakakis, 15–54. Springer.
Fromont, Robert, Lynn Clark, Joshua Wilson Black, and Margaret Blackwood. Forthcoming. “Maximizing Accuracy of Forced Alignment for Spontaneous Child Speech.” Language Development Research, Forthcoming.
Gordon, Elizabeth, Lyle Campbell, Jennifer Hay, Margaret Maclagan, Andrea Sudbury, and Peter Trudgill. 2004. New Zealand English: Its Origins and Evolution. Cambridge University Press.
Jackson, Joshua Conrad, Joseph Watts, Johann-Mattis List, Curtis Puryear, Ryan Drabble, and Kristen A. Lindquist. 2022. “From Text to Thought: How Analyzing Language Can Advance Psychological Science.” Perspectives on Psychological Science 17 (3): 805–26. https://doi.org/10.1177/17456916211004899.
Labov, William. 2001. Principles of Linguistic Change: Social Factors. Wiley-Blackwell.
Legg, Catherine, and Joshua Black. 2020. “What Is Intelligence for? A Peircean Pragmatist Response to the Knowing-How, Knowing-That Debate.” Erkenntnis 87 (5): 2265–84. https://doi.org/10.1007/s10670-020-00301-9.
Walsh, Liam, Jen Hay, Derek Bent, Jeanette King, Paul Millar, Viktoria Papp, and Kevin Watson. 2013. “The UC QuakeBox Project: Creation of a Community-Focused Research Archive.” New Zealand English Journal 27: 20–32.
Wilson Black, Joshua. 2022a. “Model Check and Significance Testing for Vowel Space GAMMs.” 2022. https://joshua.wilsonblack.nz/post/significance-testing-for-vowel-space-gamms/.
———. 2022b. “Supplementary Material for "Creating Specialized Corpora from Digitized Historical Newspaper Archives".” 2022. https://doi.org/10.17605/OSF.IO/7CRGT.
———. 2022c. “Visualising Vowel Space Change with GAMMs.” Linguistics Methods Hub. October 28, 2022. https://doi.org/10.5281/zenodo.7261966.
———. 2022d. “Creating Specialized Corpora from Digitized Historical Newspaper Archives.” Digital Scholarship in the Humanities 38 (2): 779–97. https://doi.org/10.1093/llc/fqac079.
———. 2023. “Peirce on Metaphysics and Commonsense Belief.” In Pragmatic Reason, 195–210. Routledge. https://doi.org/10.4324/9781003165699-13.
Wilson Black, Joshua, and James Brand. 2023. “Nzilbb.vowels: Functions for Vowel Covariation Studies (0.1.0-Alpha).” https://doi.org/10.5281/zenodo.8303224.
Wilson Black, Joshua, James Brand, Jen Hay, and Lynn Clark. 2022. “Using Principal Component Analysis to Explore Co-Variation of Vowels.” Language and Linguistics Compass 17 (1). https://doi.org/10.1111/lnc3.12479.
Wilson Black, Joshua, Jen Hay, Lynn Clark, and James Brand. forthcoming. “The Overlooked Effect of Amplitude on Within-Speaker Vowel Variation.” Linguistics Vanguard, forthcoming.
———. 2022a. “Supplementary Material for "The Overlooked Effect of Amplitude on Within-Speaker Vowel Covariation".” 2022. https://nzilbb.github.io/amp_f1_public/.
———. 2022b. “Supplementary Material for "Using Principal Component Analysis to Explore Co-Variation of Vowels".” 2022. https://nzilbb.github.io/PCA_method_supplementary/pca_method_supplementary.html.